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Abstract. - We have analyzed dynamics on the complex free energy landscape of protein 
folding in the FOLD-X model, by calculating for each state of the system the mean first passage 
time to the folded state. The resulting kinetic map of the folding process shows that it proceeds 
in jumps between well-defined, local free energy minima. Closer analysis of the different local 
minima allows us to reveal secondary, parallel pathways as well as dead ends. 



Proteins are polymers with several hundred degrees of freedom, so it is natural to think 
of protein folding as (diffusive) motion on a complex potential energy surface. This landscape 
picture has become popular in the wider protein folding community during the last decade 
and clearly invites the question of the nature of the surface: ruggedness, presence of 
kinetic traps, funnels etc. One problem with answering such questions is that the concept of 
folding rate, used to describe simple two-state kinetics, can not be generalized to a property 
that is defined for each configuration of the protein, so it is not useful for describing local 
kinetics. In this Letter, we use the inverse rate, the mean first passage time (MFPT) which 
can be calculated for each configuration 3 , to analyze the kinetics of folding. Our results 
clearly reveal kinetic barriers that can not be predicted from the one-dimensional projection 
of the free energy landscape, and allow us to construct an improved reaction coordinate. 

Workable, detailed models of protein folding energetics and kinetics do not exist. Instead, 
we work with the FOLD-X model 0], which is rather simplified, but takes experimental 
knowledge into account. FOLD-X and similar models 0EllZj are based on the observation 
that the folding rate of small proteins is correlated with the entropy cost of ordering the chain 
in a near-native geometry, implying that this happens before the transition state. Non-native 
interactions are hereby rendered less likely, and the models only consider interactions present 
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in the folded structure. In these approximations, any residue can adopt only two states; 
folded/ordered or unfolded/disordered. The conformation a folded residue is always set to its 
native state, whereas a contiguous sequence of unfolded residues is treated as a disordered 
loop. 

In FOLD-X, the total free energy, G, of a specific protein state (defined by the binary 
sequence of folded/unfolded residues) is the sum of the interactions between the atoms of the 
folded residues plus terms accounting for the loss of chain entropy: 

G = W V d W G v dw+W so i V HG so i V H+W so i v pG so i v p+Ghbond+G e i+T (W mc S mc + W SC S SC + Si oop ) 

(1) 

Here, G v d w is the sum of the van der ^Vaals contributions of all atoms of the folded residues. 
GgoivH and G so i v p is the difference in solvation energy upon folding of apolar and polar groups, 
respectively. Ghbond is the free energy difference between the formation of intra-molecular 
hydrogen-bonds compared to the inter- molecular formation of hydrogen-bonds with solvent. 
G e i is the electrostatic contribution of charged groups interactions. 5 mc and S sc are the sum 
of the entropy-loss associated with the ordering of respectively the backbone and the side- 
chain of each folded residue. Finally, Si oop is the sum of the entropy cost associated with the 
closure of the disordered loops connecting the stretches of the folded residues. The strength 
of the various interactions (G v d w , G so i v h, G so i v p, G e i) and the entropy cost for ordering a 
residue (S mc , Ssc) are scaled the atomic occupancies [T]| to take the effect of solvent expose 
into account @|. The terms W v d w , W so i v h, W so i v p, W mc and W sc in eq. Q are weighting 
factors applied to the raw energy/entropy terms. These weights have been obtained from 
a calibration against a comprehensive database of protein mutants The details of the 
different energy and entropy terms can be found in 0J|S]. 

In order to reduce the size of the state space, only states with one or two segments of 
ordered residues are considered. A comparison with an unrestricted Monte Carlo based sam- 
pling shows, that the two-segment approximation captures the thermodynamic behavior of 
small single domain proteins quite well We use the local kinetics of the energy model: in 
each step, the protein can add or remove a residue to an end of one of the ordered segments 
(including the possibility to remove a one-residue segment) or create a new one-residue seg- 
ment if zero or one ordered segments are present. The move probabilities P between states i 
and j are chosen so they fulfill the detailed balance condition 

plZ J i) = ex P("(^ - G *V fc sT) (2) 

This is achieved by using 

P(i -> j) = max(M 4 ,M J )- 1 max(l,exp(-(G :) - Gi)/k B T)) (3) 

where Mi and Mj are the number of moves available from state i and j, respectively. This 
expression clearly fulfills (0, while also ensuring that the Mi move probabilities out of state 
i add up to at most 1. The probability for remaining is P(i — * i) = 1 — J2j^a -f (* ~~ * l)- 

Since the kinetics are ergodic all trajectories starting in a particular state i will sooner or 
later end up in any other given state n. We can define the average time (number of steps) 
before this happens, the mean first passage time Ti(n). If we advance one step along these 
trajectories we clearly get one step closer to the passage through state n. Since we advance 
by the move probabilities P the MFPT obeys 

T i (n) = J2 p (i^j>j(n) + M (4) 
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Fig. 1 - Structure of a Src Homology 3 domain, PDB-file lshg. The colour coding shows different 
stages in folding in the FOLD-X model, as described in the main text. The figure was rendered by 
MolMol |T5) . 



where At is the length of the time step. For i = n the special condition r n (n) = must hold. 
If we define the matrix T(n) as follows: 

and the vector /3(n) by 

Aw-{fj£: <*> 

the condition Q and the special case for i = n can be summarized in a matrix equation for 
the MFPT vector r(n): 

[T{n) - l]r(n) = -/? (7) 

We solve this equation numerically using the linbcg sparse matrix routine and with n 
set to the completely folded state. 

We here present results from the calculation on an SH3 domain, which has 57 amino acid 
residues and a total of 425924 different states in the 2-segment approximation. The x-ray 
structure is shown in Figure (Protein Data Bank file lshg 1 1 ) . In order to present the 
results in condensed form we sum the Boltzmann probability over states with same MFPT r 
and number of ordered residues v 

T{v,t) =J2^M-Gi/kBT)5{T - Tj )5{v - Vj ) (8) 

3 

where 5 is the Kronecker delta. We also calculate the one-dimensional free energy 



G tot (v) = -k B Tlog§2T(v,T)] 

T 



(9) 



4 



EUROPHYSICS LETTERS 




Fig. 2 - Left, Upper panel: Contour plot of the Boltzmann weighted probability of different mean 
first passage times (ordinate) as function of the number of ordered residues (abscissa) in the SH3 
domain (pdb file lshg), with dark colour indicating high probability. Lower panel: Total free energy 
as function of number of folded residues. Right: Same as left but not counting the 3 N-terminal 
residues and 6 C-terminal residues. 

The upper panels in FigureEJshow contour plots of Y(v, r)/ exp(— Gt Q t{v) /hsT). The division 
with the relative probability for a given v is done in order also to show the kinetics at the top 
of the barrier. The lower panels show total free energy as function of the number of ordered 
residues, with the free energy in the unfolded state normalized to zero. 

In the left part of Figure |2] the reaction coordinate v takes ordering of all residues into 
account, as is usual with this type of model ^QEIiniEI- The free energy is seen to be a smooth 
function of the number of ordered residues, with minima at the unfolded and folded states 
and a maximum between them (the transition state). As expected, the MFPT to the folded 
state is high for states with few ordered residues and low for states with almost all residues 
ordered. Less expected is the formation of 'islands' with gaps between them. The gaps show 
the presence of particular critical events that in one step reduce the MFPT significantly. This 
suggests that the smooth variation of the free energy seen in the lower panel might be a poor 
picture of the actual energetics. 

Surprisingly, the relation between MFPT and residue ordering is not monotonous; for 
many states it is possible to find another state with fewer ordered residues but lower MFPT. 
We show in the following that this behaviour can have three reasons: 

i. Parallel folding pathways 

ii. Poor choice of reaction coordinate 

iii. Dead ends 

In order to determine the residues that must be ordered for the system to jump over a 
gap, we found for each island the states from which it is possible to jump to another island 
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Fig. 3 - Upper panel: Average ordering of residues before jumps between islands in Figure [5] For 
island 4 the average ordering is shown. Lower panel: Probability for residue to order in jump between 
islands. 

with lower MFPT, and for these states we calculated the average probability for each residue 
to be folded. The results are shown in the upper panel of Figure |2l The lower panel shows 
which residues on average become ordered right in the jump between islands. 

Passage from island la or lb to island 2 is seen to involve ordering of residues 32-57, 
which constitute most of the three strands of the main /3-sheet. This finding is consistent with 
earlier folding studies of the SH3-domain [T2Tll3lll4| . In fact, experimental studies [T7HT%| also 
demonstrate that the diffusive contribution from the unfolded residues (here 6-31) may play 
an important role in driving the initial folding event. Accounting for this contribution falls 
outside the scope of the present model. 

Interestingly, the map shows that the initial folding event can happen by two parallel 
pathways: the dominant path (island lb) is formation of the last two strands (cyan in Figure 
^1 followed by the first (blue in Figure^), but a small fraction (island la, ^3.5 %) initially form 
the first two strands followed by the last. It can be seen from the free energy diagram that 
these events happen around the top of the barrier, i. e. at the transition state. Subsequently, 
the part of the protein rendered in violet folds followed by the remainder, shown in red. 

It is seen that for all islands ordering of the C-terminus is possible, but not necessary for 
jumping to the next island with shorter MFPT. This shows that folding can proceed with or 
without ordering of these residues, i.e. their ordering represent modes which are orthogonal 
to the reaction coordinate. In the island with the completely folded state (island 4) the N- 
terminus is also partially disordered. Inclusion of these residues will smear out both MFPT 
and Gtot plots in the ^-direction. We therefore plotted the data again but not counting- 
ordering of 3 N-terminal and 6 C-terminal residues in the reaction coordinate. 

The results are shown in the right part of Figure [5] The features of both MFPT and 
free energy plots are significantly sharper: the overlaps in the v direction between islands is 
diminished, and G to t is seen to have several local maxima. This explains the gaps in the MFPT 
plots: rather than a going over a smooth barrier, folding in fact proceeds through a number 
of local minima and the rate is determined by passage of the barriers in between. We have 
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performed the identical analysis on chymotrypsin inhibitor 2 (PDB-file 2ci2) and obtained 
very similar results, except that the gaps between the islands were even more pronounced. 
Note that from the free energy curve in the right part of Figure [21 one would still predict 
experimental two-state folding, as is also observed |12I13U14| . The islands are not predicted to 
represent experimental folding intermediates, since the local free energy minima in all cases 
are significantly higher than the free energy of both the denatured and native states. 

Even with the improved reaction coordinate there is still a considerable overlap in the 
^-direction between islands 3 and 4; there are states with 36 ordered residues and a MFPT 
to the folded state above 2xl0 5 steps, and other states with only 30 ordered residues and 
MFPT below 2xl0 4 steps. It turns out that the states with 30-36 ordered residues and low 
MFPT to the folded state in fact represent a dead end that is not in direct kinetic contact with 
island 3. Calculation of the MFPT to the unfolded state shows that these states have longer 
MFPT than even the completely folded state, meaning that they must first refold before they 
can unfold correctly (data not shown). What happens is that in these states the loop 16-26 
disorders while the residues 9-15 stay in place. Apparently, this is energetically just as favored 
as disordering from the N-terminus, but the barrier for further disordering along this route is 
higher, and so the protein instead refolds the loop and starts unfolding at the N-terminus. The 
kinetic connections are the same in both directions (microscopic reversibility) and so these 
states that can not directly unfold are also not reached on the folding pathway. 

In conclusion, our results show that solution of the mean first passage time equation 
provides a new roadmap to protein folding landscapes. The landscape turns out to be rugged 
rather than smooth, secondary pathways and dead ends reveal themselves and it becomes 
possible to pinpoint the exact residues involved in barrier passage. The present work employs 
realistic free energies and a simplified representation of the ordering dynamics, but we believe 
that this type of approach should be valuable also in the analysis of models with realistic 
descriptions of structural change. 
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